A Video-Driven Cloning Network for Pose-to-Video Translation
Our main goal is to learn to translate temporal sequences of 2D human
poses into video frames depicting a person who closely resembles the
target actor in a given reference video. By feeding such a translator
human poses extracted from different driving videos, it may, in
principle, be used to clone arbitrary video performances of other
people.
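Concretely, one way to realize such a translator is as an image-to-image network operating on a rasterized pose representation. The following is a minimal sketch, not the architecture used here: it assumes each pose is encoded as a stack of per-joint heatmaps, and all layer sizes are illustrative.

```python
import torch
import torch.nn as nn

class PoseToFrameTranslator(nn.Module):
    """Maps a J-channel pose heatmap to a 3-channel RGB frame.
    Illustrative encoder-decoder; not the paper's actual network."""
    def __init__(self, num_joints: int = 18, base: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            # Encoder: downsample the rasterized pose.
            nn.Conv2d(num_joints, base, 4, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(base, base * 2, 4, stride=2, padding=1),
            nn.ReLU(inplace=True),
            # Decoder: upsample back to image resolution.
            nn.ConvTranspose2d(base * 2, base, 4, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.ConvTranspose2d(base, 3, 4, stride=2, padding=1),
            nn.Tanh(),  # RGB output in [-1, 1]
        )

    def forward(self, pose_heatmaps: torch.Tensor) -> torch.Tensor:
        return self.net(pose_heatmaps)

# Usage: one 256x256 frame synthesized from one 18-joint pose.
frame = PoseToFrameTranslator()(torch.randn(1, 18, 256, 256))
```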
Measures of Pose Similarity
As discussed above, the visual quality of the frames produced by our
network depends heavily on how closely the poses in the driving video
resemble those in the paired training data extracted from the
reference video. However, a driving pose need not exactly match any
single pose in the training set: because our network uses a PatchGAN
discriminator, it can synthesize a frame by aggregating local patches
observed during training.
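To make "aggregating local patches" concrete, below is a hedged sketch of a standard PatchGAN-style discriminator in the spirit of pix2pix, not necessarily the exact network used in this work. Because it is fully convolutional, each cell of its output scores only one local receptive field of the input, which is what lets the generator reuse local appearance patches seen during training.

```python
import torch
import torch.nn as nn

class PatchDiscriminator(nn.Module):
    """Fully convolutional discriminator: outputs a grid of
    real/fake scores, one per overlapping input patch."""
    def __init__(self, in_channels: int = 3, base: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, base, 4, stride=2, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(base, base * 2, 4, stride=2, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(base * 2, base * 4, 4, stride=2, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(base * 4, 1, 4, stride=1, padding=1),  # score map
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

# Each output cell scores one local patch of the 256x256 input.
scores = PatchDiscriminator()(torch.randn(1, 3, 256, 256))
print(scores.shape)  # torch.Size([1, 1, 31, 31])
```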
It is therefore important to quantify the similarity between driving
and reference poses in a manner that is both local and
translation-invariant. In this section, we propose pose metrics that
attempt to measure similarity in such a manner.
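As an illustration of what a local, translation-invariant comparison might look like, the sketch below compares two poses through their bone vectors (offsets between connected joints), which are unaffected by global position; the skeleton connectivity and joint count here are assumptions for demonstration, not the metrics proposed in this section.

```python
import numpy as np

# Hypothetical skeleton: pairs of connected joint indices.
BONES = [(0, 1), (1, 2), (2, 3),
         (1, 4), (4, 5), (1, 6),
         (6, 7), (6, 8), (8, 9)]

def bone_vectors(pose: np.ndarray) -> np.ndarray:
    """pose: (J, 2) array of 2D joint positions -> (B, 2) bone offsets.
    Offsets between joints are unchanged by global translation."""
    return np.stack([pose[b] - pose[a] for a, b in BONES])

def pose_similarity(driving: np.ndarray, reference: np.ndarray) -> float:
    """Negative mean per-bone distance: higher means more similar.
    Each bone contributes locally, so a mismatch in one limb does not
    dominate the score for the rest of the body."""
    d = np.linalg.norm(bone_vectors(driving) - bone_vectors(reference),
                       axis=1)
    return -float(d.mean())

# Identical poses shifted by a constant offset score maximally,
# demonstrating translation invariance.
p = np.random.rand(10, 2)
print(pose_similarity(p, p + 5.0))  # -> -0.0
```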